Newton
Learning-Based Hashing for ANN Search: Foundations and Early Advances
Approximate Nearest Neighbour (ANN) search is a fundamental problem in information retrieval, underpinning large-scale applications in computer vision, natural language processing, and cross-modal search. Hashing-based methods provide an efficient solution by mapping high-dimensional data into compact binary codes that enable fast similarity computations in Hamming space. Over the past two decades, a substantial body of work has explored learning to hash, where projection and quantisation functions are optimised from data rather than chosen at random. This article offers a foundational survey of early learning-based hashing methods, with an emphasis on the core ideas that shaped the field. We review supervised, unsupervised, and semi-supervised approaches, highlighting how projection functions are designed to generate meaningful embeddings and how quantisation strategies convert these embeddings into binary codes. We also examine extensions to multi-bit and multi-threshold models, as well as early advances in cross-modal retrieval. Rather than providing an exhaustive account of the most recent methods, our goal is to introduce the conceptual foundations of learning-based hashing for ANN search. By situating these early models in their historical context, we aim to equip readers with a structured understanding of the principles, trade-offs, and open challenges that continue to inform current research in this area.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- (22 more...)
- Education > Educational Setting (0.47)
- Information Technology > Services (0.46)
TravelBench : Exploring LLM Performance in Low-Resource Domains
Billa, Srinivas, Jing, Xiaonan
Results on existing LLM benchmarks capture little information over the model capabilities in low-resource tasks, making it difficult to develop effective solutions in these domains. To address these challenges, we curated 14 travel-domain datasets spanning 7 common NLP tasks using anonymised data from real-world scenarios, and analysed the performance across LLMs. We report on the accuracy, scaling behaviour, and reasoning capabilities of LLMs in a variety of tasks. Our results confirm that general benchmarking results are insufficient for understanding model performance in low-resource tasks. Despite the amount of training FLOPs, out-of-the-box LLMs hit performance bottlenecks in complex, domain-specific scenarios. Furthermore, reasoning provides a more significant boost for smaller LLMs by making the model a better judge on certain tasks.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- (5 more...)
Optimizing F-Measures by Cost-Sensitive Classification
Shameem Puthiya Parambath, Nicolas Usunier, Yves Grandvalet
We present a theoretical analysis of F -measures for binary, multiclass and mul-tilabel classification. These performance measures are non-linear, but in many scenarios they are pseudo-linear functions of the per-class false negative/false positive rate. Based on this observation, we present a general reduction of F - measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F -measure by solving a series of cost-sensitive classification problems. The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on F -measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F -measure optimization tasks.
- North America > United States > Massachusetts > Middlesex County > Newton (0.04)
- Europe > France > Hauts-de-France (0.04)
- Europe > Bulgaria > Sofia City Province > Sofia (0.04)
- Asia > Taiwan (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
- North America > United States > Massachusetts > Middlesex County > Newton (0.04)
MODE: Mixture of Document Experts for RAG
Retrieval-Augmented Generation (RAG) often relies on large vector databases and cross-encoders tuned for large-scale corpora, which can be excessive for small, domain-specific collections. We present MODE (Mixture of Document Experts), a lightweight alternative that replaces fine-grained nearest-neighbor search with cluster-and-route retrieval. Documents are embedded, grouped into semantically coherent clusters, and represented by cached centroids. At query time, we route to the top centroid(s) and retrieve context only within those clusters, eliminating external vector-database infrastructure and reranking while keeping latency low. On HotpotQA and SQuAD corpora with 100-500 chunks, MODE matches or exceeds a dense-retrieval baseline in answer quality while reducing end-to-end retrieval time. Ablations show that cluster granularity and multi-cluster routing control the recall/precision trade-off, and that tighter clusters improve downstream accuracy. MODE offers a practical recipe for small and medium corpora where simplicity, speed, and topical focus matter.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.51)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
Bridging Language Models and Financial Analysis
Lopez-Lira, Alejandro, Kwon, Jihoon, Yoon, Sangwoon, Sohn, Jy-yong, Choi, Chanyeol
The rapid advancements in Large Language Models (LLMs) have unlocked transformative possibilities in natural language processing, particularly within the financial sector. Financial data is often embedded in intricate relationships across textual content, numerical tables, and visual charts, posing challenges that traditional methods struggle to address effectively. However, the emergence of LLMs offers new pathways for processing and analyzing this multifaceted data with increased efficiency and insight. Despite the fast pace of innovation in LLM research, there remains a significant gap in their practical adoption within the finance industry, where cautious integration and long-term validation are prioritized. This disparity has led to a slower implementation of emerging LLM techniques, despite their immense potential in financial applications. As a result, many of the latest advancements in LLM technology remain underexplored or not fully utilized in this domain. This survey seeks to bridge this gap by providing a comprehensive overview of recent developments in LLM research and examining their applicability to the financial sector. Building on previous survey literature, we highlight several novel LLM methodologies, exploring their distinctive capabilities and their potential relevance to financial data analysis. By synthesizing insights from a broad range of studies, this paper aims to serve as a valuable resource for researchers and practitioners, offering direction on promising research avenues and outlining future opportunities for advancing LLM applications in finance.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Overview (1.00)
- Research Report > New Finding (0.45)
WorldModelBench: Judging Video Generation Models As World Models
Li, Dacheng, Fang, Yunhao, Chen, Yukang, Yang, Shuo, Cao, Shiyi, Wong, Justin, Luo, Michael, Wang, Xiaolong, Yin, Hongxu, Gonzalez, Joseph E., Stoica, Ion, Han, Song, Lu, Yao
Video generation models have rapidly progressed, positioning themselves as video world models capable of supporting decision-making applications like robotics and autonomous driving. However, current benchmarks fail to rigorously evaluate these claims, focusing only on general video quality, ignoring important factors to world models such as physics adherence. To bridge this gap, we propose WorldModelBench, a benchmark designed to evaluate the world modeling capabilities of video generation models in application-driven domains. WorldModelBench offers two key advantages: (1) Against to nuanced world modeling violations: By incorporating instruction-following and physics-adherence dimensions, WorldModelBench detects subtle violations, such as irregular changes in object size that breach the mass conservation law - issues overlooked by prior benchmarks. (2) Aligned with large-scale human preferences: We crowd-source 67K human labels to accurately measure 14 frontier models. Using our high-quality human labels, we further fine-tune an accurate judger to automate the evaluation procedure, achieving 8.6% higher average accuracy in predicting world modeling violations than GPT-4o with 2B parameters. In addition, we demonstrate that training to align human annotations by maximizing the rewards from the judger noticeably improve the world modeling capability. The website is available at https://worldmodelbench-team.github.io.
- North America > United States > Massachusetts > Middlesex County > Newton (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
Optimizing F-Measures by Cost-Sensitive Classification
Shameem Puthiya Parambath, Nicolas Usunier, Yves Grandvalet
We present a theoretical analysis of F -measures for binary, multiclass and multilabel classification. These performance measures are non-linear, but in many scenarios they are pseudo-linear functions of the per-class false negative/false positive rate. Based on this observation, we present a general reduction of F - measure maximization to cost-sensitive classification with unknown costs. We then propose an algorithm with provable guarantees to obtain an approximately optimal classifier for the F -measure by solving a series of cost-sensitive classification problems. The strength of our analysis is to be valid on any dataset and any class of classifiers, extending the existing theoretical results on F -measures, which are asymptotic in nature. We present numerical experiments to illustrate the relative importance of cost asymmetry and thresholding when learning linear classifiers on various F -measure optimization tasks.
- Europe > France > Hauts-de-France > Oise > Compiègne (0.04)
- North America > United States > Massachusetts > Middlesex County > Newton (0.04)
- Europe > Bulgaria > Sofia City Province > Sofia (0.04)
- Asia > Taiwan (0.04)
Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation
Samarinas, Chris, Krubner, Alexander, Salemi, Alireza, Kim, Youngwoo, Zamani, Hamed
This paper presents ICAT, an evaluation framework for measuring coverage of diverse factual information in long-form text generation. ICAT breaks down a long output text into a list of atomic claims and not only verifies each claim through retrieval from a (reliable) knowledge source, but also computes the alignment between the atomic factual claims and various aspects expected to be presented in the output. We study three implementations of the ICAT framework, each with a different assumption on the availability of aspects and alignment method. By adopting data from the diversification task in the TREC Web Track and the ClueWeb corpus, we evaluate the ICAT framework. We demonstrate strong correlation with human judgments and provide comprehensive evaluation across multiple state-of-the-art LLMs. Our framework further offers interpretable and fine-grained analysis of diversity and coverage. Its modular design allows for easy adaptation to different domains and datasets, making it a valuable tool for evaluating the qualitative aspects of long-form responses produced by LLMs.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Austria > Salzburg > Salzburg (0.04)
- (9 more...)
Accelerated Bayesian parameter estimation and model selection for gravitational waves with normalizing flows
Polanska, Alicja, Wouters, Thibeau, Pang, Peter T. H., Wong, Kaze K. W., McEwen, Jason D.
We present an accelerated pipeline, based on high-performance computing techniques and normalizing flows, for joint Bayesian parameter estimation and model selection and demonstrate its efficiency in gravitational wave astrophysics. We integrate the Jim inference toolkit, a normalizing flow-enhanced Markov chain Monte Carlo (MCMC) sampler, with the learned harmonic mean estimator. Our Bayesian evidence estimates run on $1$ GPU are consistent with traditional nested sampling techniques run on $16$ CPU cores, while reducing the computation time by factors of $5\times$ and $15\times$ for $4$-dimensional and $11$-dimensional gravitational wave inference problems, respectively. Our code is available in well-tested and thoroughly documented open-source packages, ensuring accessibility and reproducibility for the wider research community.
- North America > United States > Massachusetts > Middlesex County > Newton (0.04)
- Europe > Netherlands (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)